Statistical mechanics for natural flocks of birds 
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Interactions among neighboring birds in a flock cause an alignment of their flight directions. We 
show that the minimally structured (maximum entropy) model consistent with these local correla- 
tions correctly predicts the propagation of order throughout entire flocks of starlings, with no free 
parameters. These models are mathematically equivalent to the Heisenberg model of magnetism, 
and define an "energy" for each configuration of flight directions in the flock. Comparing flocks of 
different densities, the range of interactions that contribute to the energy involves a fixed number 
of (topological) neighbors, rather than a fixed (metric) spatial range. Comparing flocks of different 
sizes, the model correctly accounts for the observed scale invariance of long ranged correlations 
among the fluctuations in flight direction. 



The collective behaviour of large groups of animals is 
an imposing natural phenomenon, very hard to cast into 
a systematic theory [l] . Physicists have long hoped that 
such collective behaviours in biological systems could be 
understood in the same way as we understand collec- 
tive behaviour in physics, where statistical mechanics 
provides a bridge between microscopic rules and macro- 
scopic phenomena [U |3] . A natural test case for this ap- 
proach is the emergence of order in a flock of birds: out 
of a network of distributed interactions among the indi- 
viduals, the entire flock spontaneously chooses a unique 
direction in which to fly [4 , much as local interactions 
among individual spins in a ferromagnet lead to a spon- 
taneous magnetization of the system as a whole [5 . De- 
spite detailed development of these ideas [BHS] , there still 
is a gap between theory and experiment. Here we show 
how to bridge this gap, by constructing a maximum en- 
tropy model [TO] based on field data of large flocks of 
starlings pTHT3] . We use this framework to show that 
the effective interactions among birds are local, and that 
the number of interacting neighbors is independent of 
flock density, confirming that interactions are ruled by 
topological rather than metric distance [14]. The statis- 
tical mechanics models that we derive in this way provide 
an essentially complete, parameter free theory for the 
propagation of directional order throughout the flock. 

We consider flocks of European starlings Sturnus vul- 
garis^ as in FigTO^. At any given instant of time, follow- 
ing Refs [TTHT3] T we can attach to each bird i a vector 
velocity iTi, and define the normalized velocity 8^ = Vi/\v]\ 
(Fig[l^). On the hypothesis that flocks have statistically 
stationary states, we can think of all these normalized ve- 
locities as being drawn (jointly) from a probability dis- 
tribution P({5i}). It is not possible to infer this full dis- 
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FIG. 1: The raw data. (A) One snapshot from flocking event 
28-10, N = 1246 birds (see Table|l|in Methods). (B) Instan- 
taneous vector velocities of all the individuals in this snap- 
shot, normalized as S{ = V{/\vi\. 



tribution directly from experiments, since the space of 
states specified by {si} is too large. However, what we 
can measure from field data is the matrix of correlations 
between the normalized velocities, Cij = (^i-Sj). There 
are infinitely many distributions P{{si\) that are consis- 
tent with the measured correlations, but out of all these 
distributions, there is one that has minimal structure: it 
describes a system that is as random as it can be while 
still matching the experimental data. This distribution 
is the one with maximum entropy [TO] . 

It should be emphasized that the maximum entropy 
principle is not a "modeling assumption;" rather it is the 
absence of assumptions. Any other model that accounts 
for the observed correlations will have more structure, 
and hence (explicitly or implicitly) assumes something 
about the nature of the interactions in the flock beyond 
what is required to match the data. Of course the fact 



that the maximum entropy model is minimahy struc- 
tured does not make it correct. It could be, for example, 
that individual birds set their flight direction by comput- 
ing a complicated nonlinear combination of the veloci- 
ties from multiple neighbors, in which case correlations 
among pairs of birds would be insufficient to capture 
the essence of the ordering mechanism. We view the 
maximum entropy model as a powerful starting point, 
from which, as we will see, we can generate detailed and 
testable predictions. 

The maximum entropy distribution consistent with 



the directional correlations Qj is 
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where Z({Jij}) is the appropriate normalization factor, 
or partition function; the derivation follows Ref [10], as 
explained in Appendix [Aj Notice that there is one pa- 
rameter Jij corresponding to each measured element Cij 
of the correlation matrix. To finish the construction of 
the model, we have to adjust the values of the Jij to 
match the experimentally observed Cy , 
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where the symbol (•)_p indicates an average using distri- 
bution P from Eq (IT]), whereas (•)exp indicates an aver- 
age over many experiments. This matching condition is 
equivalent to maximizing the likelihood that the model 
in Eq (IT]) will generate the data from which the correla- 
tions were computed. 

The probability distribution in Eq (IT]) is mathemati- 
cally identical to a model that is familiar from the physics 
of magnets — the Heisenberg model [5 — in which a col- 
lection of spins Si interact so that their energy (or Hamil- 
tonian) is H{{si}) = -(1/2) ^. ^ Jy si-s-; Eq ^ then de- 
scribes the thermal equilibrium or Boltzmann distribu- 
tion at a temperature ksT = 1. In this context, the con- 
stants Jij are the strength of interaction between spins 
i and j, where J > means that these elements tend 
to align. For many physical systems, once we know the 
Hamiltonian there is a plausible dynamics that allows 
the system to relax toward equilbrium, and this is the 
Langevin dynamics 
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where f]i{t) is an independent white noise "force" driv- 
ing each separate degree of freedom. Finding trajectories 
Si{t) that solve Eq ([s]) produces samples that are drawn 
out of the probability distribution in Eq ([T]) . The inter- 
esting point is that this kind of dynamical model also 
is well known in biology: the direction of motion of an 
individual evolves in time according to "social forces" 
reflecting a weighted sum of inputs from neighboring in- 
dividuals, plus noise \^. In this framework, Jij measures 



the strength of the force that tries to align the velocity 
of bird i along the direction defined by bird j . We em- 
phasize that this is not an analogy, but a mathematical 
equivalence. 

In contrast to most networks, the connectivity in a 
flock of birds is intrinsically dynamic — birds move and 
change their neighbors. Thus, it may not make sense 
to talk about matrix of correlations Cij or interactions 
Jij between labelled individuals. On the other hand, the 
continuous and rapid change of neighbors induced by 
motion implies that the interaction Jij between bird i and 
bird j cannot depend directly on their specific identities, 
but only on some function of their relative positions. 

The simplest form of interaction that is independent 
of the birds' identity is one in which each bird interacts 
with the same strength, J, with the same number of 
neighbors, ric (or with all birds within the same radius 
Tc] see below). If the interactions are of this form, then 
Eq ([T]) simplifies to 
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where j G nj, means that bird j belongs to the first ric 
nearest neighbors of i. This is, in fact, the maximum 
entropy model consistent with the average correlation 
among birds within the neighborhood defined by Uc, 
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Biologically, Eq's (E) and ([5| encapsulate the concept 
that the fundamental correlations are between birds and 
their directly interacting neighbors; all more distant cor- 
relations should be derivable from these. As in the more 
general problem, finding the values of J and Uc that re- 
produce the observed correlation Cint is the same as max- 
imizing the probability, or likelihood, that model Eq (|4| 
generates the observed configuration of flight directions 
{si} in a single snapshot. If this is correct, a model that 
appropriately reproduces the fundamental correlations 
up to the scale ric must be able to describe correlations 
on all length scales. 

Importantly, with large flocks we can estimate the cor- 
relations among interacting neighbors from a single snap- 
shot of the birds' flight directions {si}, as indicated in 
the second step of Eq ([5|. In contrast, if we were trying 
to estimate the entire correlation matrix in Eq ([2|, we 
would need as many samples as we have birds in the flock 
(see Appendix Ib]), and we would have to treat explicitly 
the dynamic rearrangements of the interaction network 
during flight. This is an extreme version of the general 
observation that the sampling problems involved in the 
construction of maximum entropy models can be greatly 
reduced if we have prior expectations that constrain the 
structure of the interaction matrix ^151 [16] . 
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FIG. 2: Setting the strength and range of interactions. (A) The predicted strength of correlation, Cint, as a function of the 
interaction strength J, with ric = 11, for the snapshot in Fig[l] Matching the experimental value of Cint = 0.99592 determines 
J = 45.73. Inset: Zoom of the crossing point; error bars are obtained from the model's predictions of fluctuations of Cint(J, ric). 
(B) The log-likelihood of the data per bird ((lnP({si}))exp/A/') as a function of ric with J optimized for each value of ric; 
same snapshot as in (A). There is a clear maximum at ric — H- Inset: the log-likelihood per bird for other snapshots of 
the same flocking event. (C) The inferred value of J for all observed flocks, shown as a function of the flock's size. Each 
point corresponds to an average over all the snapshots of the same flock. Error bars are standard deviations across multiple 
snapshots. (D) As in (C), but for the inferred values of ric- Averaging over all flocks we find ric = 21.2zb 1.7 (black line). 
(E) The inferred value of the topological range ric as a function of the mean inter-bird distance in the flock, for all flocks. 
Error bars are standard deviations across multiple snapshots of the same flock. (F) As in (E), but for the metric range Vc- 
Interactions extend over some fixed metric distance ro, then we expect rtc oi ri/ro and Vc — constant; we find the opposite 
pattern, which is a signature of interactions with a fixed number of topological neighbors [14j . 



We now apply this analysis to data on real flocks of 
starlings. Given a snapshot of the flock, we have the con- 
figuration {^i}, and we need to evaluate the probability 
P({5i}) in Eq Q for any value of J and ric^ then max- 
imize this probability with respect to these parameters 
(see Methods and the Appendices for details of the cal- 
culation). Special care must be devoted to birds on the 
outer edge, or border, of the flock, since these individu- 
als have very asymmetric neighborhoods and may receive 
inputs from signals outside the flock. If we take the flight 
directions of these border birds as given, we can study 
how information propagates through the flock, without 
having to make assumptions about how the boundary is 
different from the interior. Technically, then, we describe 
the flock with Eq Q, but with the flight directions of 
the border birds fixed (again, see Methods and especially 
Appendix [C| for details). 

We proceed as follows: For a single flock, at a given 
instant of time, we compute the correlation dnt pre- 
dicted by the model in Eq Q as a function of the 
coupling strength J, and compare it with the experi- 
mental value of the correlation (Fig[2]A). The equation 
C\nt{J^ nc) = C'f^^^, fixes J {ric) for each value of nc- Then 



we fix the interaction range by looking at the overall 
probability of the data as a function of ric. In general 
there is a clear optimum (Fig^p), from which we finally 
infer the maximum entropy value of both parameters, 
ric and J. We repeat this procedure for every snapshot 
of each flock, and compute the mean and standard de- 
viation of the interaction parameters for each flock over 
time. Alternatively, we can average the log-likelihood 
over many snapshots, and then optimize, and this gives 
equivalent results for J and ric (see Appendix [e|) . 

In Figures |2p and D we report the value of the inter- 
action strength J and of the interaction range ric for all 
flocks, as a function of the flock's spatial size, L. The 
inferred values of J and ric are reproducible, although 
error bars are larger for smaller flocks. In particular, J 
and ric do not show any significant trend with the flocks' 
linear dimensions, with the number of birds, or with the 
density. This did not have to be the case, nor is it in 
any way built in to our framework; for example, if the 
real interactions extended over long distances, then our 
fitting procedure would produce an increase of ric and J 
with the size of the flock. 

In Figure [2tl we also show that the interaction range 
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FIG. 3: Correlation functions predicted by the maximum entropy model vs experiment. The full pair correlation function 
can be written in terms of a longitudinal and a perpendicular component, i.e. (si • Sj) = (spSj") + (tti • ttj). Since the two 
components have different amplitudes, it is convenient to look at them separately. (A) Perpendicular component of the 
correlation, C^(r) = (vfi • vfj), as a function of the distance; the average is performed over all pairs ij separated by distance r. 
Blue diamonds refer to experimental data (for the sample in Fig Tj), red circles to the prediction of the model in Eq Q. The 
dashed line marks the maximum r that contributes to Cint, which is the only input to the model. The correlation function is 
well fitted over all length scales. In particular, the correlation length ^, defined as the distance where the correlation crosses 
zero, is well reproduced by the model. Inset: ^ vs. size of the flock, for all the flocking events; error bars are standard deviations 
across multiple snapshots of the same flocking event. (B) Four-point correlation function C4(ri;r2) = ((vfi-Tfj) (vrk-Tfi)), where 
the pairs ij and kl are as shown in the inset (see also Appendix [G]) . The figure shows the behaviour of C^iri; r2) as a function 
of r2, with ri = 0.5. (C) Longitudinal component of the correlation C^{r) — {s\s\) — aS^, as a function of distance. Note 



that in the spin wave approximation, C (r) 



■ C4(0; 



S . (D) Similarity between the predicted mean value of flight 



direction, (vfi), and real data, for all individual birds in the interior of the flock. The similarity can be quantified through the 
local overlap qi — (tti) • 7ff^^/(|(7fi)||7ff^^|), which is plotted as a function of the distance of the individual from the border. 
Maximal similarity corresponds to ^i = 1. Inset: full distribution P{q) for all the interior birds. 



Tie does not depend on the typical distance between 
neighboring birds, ri, which is closely related to the 
flock's density. Of course, we can run exactly the same 
method using a metric interaction range, Tc, rather than 
a topological Tdinge, 71^ We simply set Jy = J if and only 
if birds i and j lie within Vc meters. When we do this we 
find that the metric range Vc does depend on the near- 
est neighbor distance ri , in contrast with the topological 
range ric (Fig|2^). This result provides strong support 
for the claim put forward in Ref [IT that birds interact 
with a fixed number of neighbors, rather than with all 
the birds within a fixed metric distance. 

Having fixed J and ric by matching the scalar cor- 
relation in the flock, we have no free parameters — 



everything that we calculate now is a parameter free 
prediction. We begin by computing the correlations 
between pairs of birds as a function of their distance, 
C{r) ^ J]]-- Si'Sj 5{r — nj), as shown in FiglsK. There is 
extremely good agreement across the full range of dis- 
tances. As we have seen, our maximum entropy calcula- 
tion finds local interactions, i.e. a relatively small value 
of Tic {ric ^ 20 for flocks of up to thousands birds). This 
implies that the scalar correlation Qnt, used as an exper- 
imental input to the calculation, is the integral of C(r) 
only over a very small interval close to r = 0: only the av- 
erage value of pair correlations at very short distances is 
used as an input to the calculation, whereas all the long 
range part of C{r) is not. Nevertheless, we have very 
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FIG. 4: Maximum entropy analysis for a model of self- 
propelled particles. (A) Inferred value of the parameter J 
vs. microscopic strength of alignment forces used in the sim- 
ulation. (B) Inferred value of ric vs. the true number of in- 
teracting neighbors in the simualtion. Slopes of the lines are 
2.2 and 2.7, respectively. Error bars are standard deviations 
across 45 snapshots of the same simulation. 



good agreement out to the overall extent of the flock it- 
self. This confirms our expectation that a model for the 
local correlations is able to describe correlations on all 
length scales. We draw attention to the fact that the 
apparent correlation length, defined by C{r = C) = ^^ is 
predicted to scale with the linear size of the flock (^ ex L, 
inset to FiglsK), as observed experimentally. 

Correlations exist not just between pairs of birds, but 
among larger n-tuplets. In Fig [3^ we consider the corre- 
lations among quadruplets of birds. Although these cor- 
relations are small, their shape is nontrivial and quite 
noiseless. The model, which takes only local pairwise 
correlations as input, reproduces very accurately these 
4-body correlations, including a non-monotonic depen- 
dence on distance, out to distances comparable to the 
full extent of the flock. Again, this is not a fit, but a 
parameter free prediction. 

Finally, instead of measuring correlations, we can ask 
the model to predict the actual flight directions of indi- 
vidual birds in the interior of the flock, given the direc- 
tions chosen by birds on the borders. This can't work 
perfectly, since the model predicts that individual birds 
have an element of randomness in their choice of direc- 
tion relative to their neighbors, but as shown in Fig [3)1) 
the overlap between predicted and observed directions 
is very good, not just for birds close to the border but 
throughout the entire "thickness" of the flock. 

The maximum entropy model has a mechanistic inter- 
pretation, from Eq (Is]), in terms of social forces driving 
the alignment of the flight directions. Given the suc- 
cess of the model in predicting the propagation of order 
throughout the flock, it is interesting to ask whether we 
can take this mechanistic interpretation seriously. As a 
test, we have simulated a population of self-propelled 
particles in three dimensions moving according to social 
forces that tend to align each particle with the average 
direction of its neighbors, as described by Eqs. ([9| and 
(10) in the Methods. We then compared the simula- 

) to the values (J" 



tion parameters (J^^"^,<"^) to the values (J^^^, nj^^^"^) 
obtained by applying the maximum entropy method to 



snapshots drawn from the simulation, just as we have 
analyzed the real data. Both the strength and the range 
of the interaction given by the maximum entropy anal- 
ysis are proportional to the "microscopic" parameters 
used in the simulation (Figs [4]A and B), although the 
maximum entropy interaction range njf ®"^ is roughly 3 x 
larger than the true number of interacting neighbors, 
^sim ^g believe that this overestimation is due to the 
fact that birds (unlike spins) move through the flock, 
encountering new neighbors before losing memory of the 
earlier flight directions, and in so doing propagate infor- 
mation and correlation more effectively than if they were 
sitting on a fixed network. In other words, the maximum 
entropy model, where interactions are static by construc- 
tion, compensates the dynamical nature of the true in- 
teraction network by giving a larger effective value of Uc 
Hydrodynamic theories of flocking [6l E] provide an an- 
alytic treatment of this effect, which is essential for col- 
lective motion of large two-dimensional groups. Indeed, 
in the limit of very large flocks, this ratio between the 
microscopic range of interactions and the effective range 
recovered by maximum entropy methods could become 
arbitrarily large, but the flocks we study here seem not 
to be big enough for this effect to take over. If we use 
the "calibration" of model from Fig[4p, then the obser- 
vation of n^ = 21.6 in the real flocks (Fig[2]) suggests that 
the true interactions extend over ric = 7.8, in reasonable 
agreement with the result from [TH |20], ric = 7.0 ± 0.6 
using very different methods. 

To summarize, we have constructed the minimal 
model that is consistent with a single number charac- 
terizing the interactions among birds in a flock, the av- 
erage correlation between the flight directions of imme- 
diate neighbors. Perhaps surprisingly, this provides an 
essentially complete theory for the propagation of di- 
rectional order throughout the flock, with no free pa- 
rameters. The theory predicts major qualitative effects, 
such as the presence of long ranged, scale free correla- 
tions among pairs of birds, as well as smaller, detailed ef- 
fects such as the non-monotonic distance dependence of 
(four-point) correlations among two pairs of birds. The 
structure of the model corresponds to interactions with 
a fixed number of (topological) neighbors, rather than 
with all neighbors that fall within a certain (metric) dis- 
tance; the relevant number of neighbors and the strength 
of the interaction are remarkably constant across multi- 
ple flocking events. Our approach can be seen as part of a 
larger effort using maximum entropy methods to link the 
collective behaviour of real biological systems to theories 
grounded in statistical mechanics [2TH33] . As with these 
other examples, we view the success of our theory as an 
encouraging first step. We have focused on the flight di- 
rections, taking the positions of the birds as given. A full 
theory must connect the velocities of the birds to their 
evolving positions, which requires more accurate mea- 
surements of trajectories over time, and we must con- 
sider the fluctuations in the speed as well as direction of 
flight. There are maximum entropy approaches to both 



of these problems, and the mapping from maximum en- 
tropy models to statistical mechanics suggests that the 
observation of scale free correlations in speed fluctua- 
tions [T7 will locate models of flocking at an especially 
interesting point in their parameter space [34 . 
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Methods 

Data. Analyzed data were obtained from experi- 
ments on large flocks of starlings {Sturnus vulgaris)^ in 
the field. Using stereometric photography and innova- 
tive computer vision techniques |12l [13] the individual 
3D coordinates and velocities were measured in cohe- 
sive groups of up to 4268 individuals [TTl HH HZ] . The 
data- set comprises 21 distinct flocking events (see Table 
[l|) . Each event consists of up to 40 consecutive 3D con- 
figurations (individual positions and velocities), at time 
intervals of 1/10 s. 

Analytic approach to the maximum entropy 



model. To apply the maximum entropy analysis, we 
need to compute the expected values of correlation 
functions according to the measure defined in Eq ([l]). 
This requires the computation of the partition function 
Z({Jij}), which is, in general, a hard task. Flocks, 
however, are very ordered groups, in that birds' ve- 
locities are neatly aligned to each other [ITj. In this 
case we can use the "spin wave" approximation [18], 
which exploits the strong ordering condition. Let us 
call S = (1/^) Xli Si = S n the global order parame- 
ter, or polarization, measuring the degree of collective 
alignment, where n is a unit vector. Individual orienta- 
tions can be rewritten in terms of a longitudinal and a 
perpendicular component: Si = sfn-\-7ri. If the system is 
highly polarized, 5' ~ 1, |7fi| < 1, and sf" ^ 1 - |7fip/2; 
we verify this last condition for our data in Fig [5] of Ap- 
pendix [B] The partition function can be written as an 
integral over the {tt}, and if 5 ~ 1 the leading terms are 
(see Appendix [B] for details): 



^({^ij}) 



/ 



a Trexp 



1 ^ 



Aijifi 



N 
i,j = l 



(6) 
where Ay = "^^ Jik^ij — -^ij, d^^f = J^- diTi and the {TTi} 
satisfy the constraint ^- tti = . If we consider the flight 
directions of birds on the border as given, integration 
must be performed with respect to internal variables only 
(see Appendix [D]). After some algebra one gets 
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Here X and B represent the subsets of, respectively, in- 
ternal and border individuals; hi = X]iGi3 "^ii^i ^^ ^ 'field' 
describing the influence of birds on the border on inter- 
nal bird i; and, now. Ay = ^^(Zlfc^x ^ik + E^G^^il^^)• 
The integral in Eq ^ can be carried out explicitly; see 
Eq (D6). The reduced model in Eq Q corresponds to 

1/2, or according to whether 



Ji 



J nij, with nij 



both individuals, just one, or none, belong to the local 
nc-neighborhood of the other. Given the individual coor- 
dinates of birds in space, the matrix Ay can be computed 
for any given snapshot, and Z{J^nc]B) (and correlation 
functions) can be calculated as a function of J and Uc. 
These two parameters must then be adjusted to maxi- 
mize the log-likelihood of the data. 
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Maximizing with respect to J corresponds to equating 
expected and experimental correlations. In our case, this 
equation can be solved analytically, leading t o an e xplicit 
expression of the optimal J vs. Uc] see Eq (D12). Min- 
imization with respect to Uc can then be performed nu- 
merically. A graphical visualization of the solution can 
be found in Fig. [2] 

Self-propelled particle model. We consider a 
model of self-propelled particles extensively studied in 
the literature [9] . Each particle moves with vector veloc- 
ity Vi{t) according to the following equations: 

Vi{t^l) = voe a^iTj(t)+/3^Aj+^c^i (9) 

Xi{t ^ 1) = Xi{t) ^ Vi{t) , (10) 

where 9 is a normalization operator 6(^) = y/\y\ that 
serves to keep the speed fixed at \v\ = vq^ and j G nj. 
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means that j belongs to the Uc interacting neighbors of i. 
The distance-dependent force /y acts along the direction 
connecting i and j; following Ref [9^ 
vector between i and j, we take 

Aj(nj <n) = 
f\]{n < nj < To) = 

Kii^a < nj < ro) = 

Finally, ffi is a random unit vector, independent for each 
bird and at each moment of time. The parameters a 
and P tune the strength of the alignment and of the co- 
hesion force, respectively; in particular, the strength of 
alignment is given by J = voa/ric. To test the maximum 
entropy analysis, we modified the model in such a way 
that we could vary Uc. Specifically, we introduced an an- 
gular resolution /i such that only neighbors with mutual 
angles larger than /i were included in the neighborhood. 
When /i is of the order of the Voronoi angle the model 
is statistically equivalent to the original version (where 
Voronoi neighbors were considered), but increasing (de- 
creasing) /i one can decrease (increase) the value of ric. 
In this way both the number Uc of interacting neighbors 
and the strength of the interaction J can be arbitrarily 
tuned. Parameters were chosen as ro = 1 (to set the 
scale of distance), r^ = 0.2, rg = 0.5, r^ = 0.8, a = 35, 
/3 = 5, ^0 = 0.05, and we simulated a flock of A^ = 512 
birds. 



Appendix A: Maximum entropy approach 

The maximum entropy method has a long history. 
Recent developments in experimental methods have re- 
newed interest in this idea as a path for constructing sta- 
tistical mechanics models of biological systems directly 
from real data, with examples drawn from networks of 
neurons J2Tti26] , ensembles of amino acid sequences [27]- 
[30j , biochemical and genetic networks [STJ [32] , and the 
statistics of letters in words [33 . Here we give a review of 
the basic ideas leading to Eq Q of the main text, hoping 
to make the discussion accessible to a wider readership. 

Imagine a system whose state at any one in- 
stant of time is described by a set of variables 
{xi, X2, • • • , xn} = X. For the moment we don't need 
to specify the nature of these variables — they could be 
positions or velocities of individual birds i = 1, 2, • • • A/" 
in a flock, or more subtle parameters of body shape or 
instantaneous posture. Whatever our choice of variables, 
we know that when the number of elements in the system 
N (here, the number of birds in the flock) becomes large, 
the space x becomes exponentially larger. Thus there is 
no sense in which we can "measure" the distribution of 
states taken on by the system, because the number of 
possibilities is just too large. On the other hand, we can 
obtain reliable measurements of certain average quanti- 
ties that are related to the state x. To give a familiar 
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TABLE I: Summary of experimental data. Flocking events 
are labelled according to experimental session number and to 
the position within the session they belong to. The number 
of birds N is the number of individuals for which we obtained 
a 3D reconstruction of positions in space. The polarization S 
is defined in the Methods and in Appendix |B] The linear size 
L of the flock is defined as the maximum distance between 
two birds belonging to the flock. The speed vq is that of the 
centre of mass, i.e. the mean velocity of the group. All values 
are averaged over several snapshots during the flocking event. 



example, we can't measure the velocity of every electron 
in a piece of wire, but certainly we can measure the aver- 
age current that flows through the wire. Formally, there 
can be several such functions, /i(x), /2(x), • • • , /k(x), 
of the state x. The minimally structured distribution 
for these data is the most random distribution P(x) 
that is consistent with the observed averages of these 



functions {(/^(x))exp}, where (• 
age measured experimentally. 



•)exp denotes an aver- 



To find the "most random" distribution, we need a 
measure of randomness. Another way to say this is that 
we want the distribution P(x) to hide as much infor- 
mation about X as possible. One might worry that in- 
formation and randomness are qualitative concepts, so 
that there would be many ways to implement this idea. 
In fact. Shannon proved that there is only one measure 
of randomness or available information that is consis- 
tent with certain simple criteria [35] |36] , and this is the 



entropy 



5[P] = -^P(x)lnP(x) 



(Al) 



Thus we want to maximize S [P] subject to the con- 
straint that the expectation values computed with P 
match the experiment ahy measured ones, that is 

(^(X))e.p = (^(X))p = ^P(x)/^(x) (A2) 



for all /i \T0 . The distribution P(x) must also be nor- 
malized, and it is convenient to think of this as the state- 
ment that the average of the "function" /o(x) = 1 must 
equal the "experimental" value of 1. Our constrained 
optimization problem can be solved using the method of 
the Lagrange multipliers [19 : we introduce a generalized 
entropy function, 

K 

S [P; {A,}] = S[P]-J2\^ [(/m(x))p - {/M(x))exp] , 

/i=0 

(A3) 
where a multiplier A^ appears for each constraint to be 
satisfied, and then we maximize S with respect to the 
probability distribution P(x) and optimize it with re- 
spect to the parameters {A^^}. 

Maximizing with respect to P(x) give us 
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lnP(x)-l-^A^^(x), 

/i=0 
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exp -^A^/^(x) 



1 



^({A.}) 



(A4) 
(A5) 



where Z({A^}) = exp(— Aq — 1). Since optimizing with 
respect to Aq will enforce normalization of the distribu- 
tion, we can write, explicitly. 



Z{{X.}) = Y.exp 



K 

■EWm(x) 



(A6) 



Optimizing with respect to {\v} gives us a set of K 
simultaneous equations 







\//i(XJ;exp — 



dS [P; {K}] 

1 



^({A.}) 



(/m(x))p 
^/^(x)exp 



Thus, when we optimize S with respect to the parame- 
ters {A^} we are enforcing that the expectation values of 
the {/^(x)} agree with their experimental values, which 
is the starting point of the maximum entropy construc- 
tion . No te also that, if we substitute Eq (A5) back into 
Eq (A3), we obtain 



K 

EWp(x) 

(A7) 



K 

S [P; {A,}] = \nZ{{X,}) + ^ A^(^(x))exp , (A8) 

which is minus the log probability, or likelihood, that the 
model generates the observed data. The optimal values 
of {Ajy} correspond to minima of <S, as can be checked by 
considering the second derivatives. Therefore, the max- 
imum entropy approach also corresponds to maximizing 
the likelihood that the model in Eq (A6) generates the 
observed data. 

The maximum entropy distributions are familiar from 
statistical mechanics. Indeed we recall that a system in 
thermal equilibrium is described by a probability distri- 
bution that has the maximum possible entropy consis- 
tent with its average energy. If the system has states 
described by a variable x, and each state has an energy 
£^(x), then this equilibrium distribution is 



P(x) 



1 



Z{P) 



-pE(x) 



(A9) 



where /3 = l/Zc^P is the inverse temperature, and the 
partition function Z{I3) normalizes the distribution. 



^(/?) = Ee" 



■/3S(x) 



(AlO) 



In this view, the temperature is just a parameter we have 
to adjust so that the average value of the energy agrees 
with experiment. The fact that equilibrium statistical 
mechanics is the prototype of maximum entropy models 
encourages us to think that the maximum entropy con- 
struction defines an effective "energy" for the system. 



Comparing Eq's (A5) and (A9) gives us 



E{^) 



K 

E 



A/iZ/iW, 



(All) 



and an effective temperature ksT = 1. This is a mathe- 
matical equivalence, not an analogy, and means that we 
can carry over our intuition from decades of theoretical 
work on statistical physics. 

In this paper, we discuss the case where the pairwise 
correlations (si-5j) are measured experimentally. Thus 
we can use the general maximum entropy formulation, 
identifying x = {si} and /^(x) = s\- Sy Since the quan- 
tities that will be measured refer to pairs, it is useful to 
set A^ = — Jij, and we obtain Eq (IT]), 



P{{Si}) = 



Z{{Jii}) 



exp 



N N 



olZIZ^ij^i'^J 



i=l j = l 



(A12) 



As before, the parameters {Jij} must be adjusted so that 

(Si-5j)p = (5i-Sj)exp. 



The model defined by Eq (A12) is identical to a well 
known model for magnetism, the Heisenberg model. In 
that case, the model describes individual spins, which 
tend to mutually align according to the interactions Jij . 
In this context, the effective energy is 



N N 



E{{si}) = -l^^Ji^s,-s^. 



(A13) 



i=l j = l 



For Jij > 0, the energy is lowered when the vectors Si 
and Sj are parallel. 



where we recall that the {si} are real, three dimensional 
vectors of unit length and d^s = f|. dsi. 

In presence of strong ordering, we can use the "spin 
wave" approximation to compute analyticall y th e par- 
tition function of the Heisenberg model, Eq (Bl). Let 
us call S = {l/N) ^^ Si = Sfi the global order param- 
eter, or polarization, measuring the degree of collective 
alignment, where n is a unit vector. Individual orienta- 
tions can be rewritten in terms of a longitudinal and a 
perpendicular component with respect to n. 



Appendix B: The spin wave approximation 

The most demanding step in evaluating the probabil- 
ity distribution in Eq (A12) is the computation of the 
partition function 



Zi{Jii})=J 



a s exp 



N N 



olZIZ^iJ^i-^J 



i=l j = l 



(Bl) 



Si 



s^n- 



TTi 



(B2) 



where, by construction, ^. sf = SN, 7fi • n = 0, and 
X^i TTi = 0. The partition function then reads 



Z{{Jii}) = d^'s^ d 



I' 



.iV^ 



UHi^ 



L\2 



TTi 



*E 



TTi exp 



N N 



lEE^^i 



(^f «f 



-TTi -TTi 



(B3) 



where d^s^ = Ui^st and d^ir = Ui^^i- The delta 
functions implement the constraint on the length of each 
vector Si and the global constraint on the tt^. Note that 
since the 1^1 's belong to the subspace perpendicular to 
n, in Eq (B4) there are only two independent degrees of 



freedom for each integration variable. 
If the system is highly polarized, 5* 



1 and iTTil < 1. 



The constraint on the norm of the vectors can then be 
written as sf" ^ 1 — |7fip/2. Note that indeed flocks are 
very polarized groups and this expression is very well 
satisfied by the data (see Fig p] and Table ^. Using this 
expansion the longitudinal components can be integrated 
out easily. The partition function then becomes, to lead- 
ing order in the tt's. 



^({^ij}) 



d^T? 
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i-m 



si'^TTijexp 



1 ^ 1 ^ 



i,j=l 



i,j=l 



(B4) 



r 



with 



^ij — 2_^ JikSij Jij 



(B5) 



approximation, and we shall drop it. We have checked in 
our computations that the corrections due to this term 
are indeed negligible. 



,2 . 



The product over 1/a/1 — |7fi| in Eq (B4) is the Jaco- 

bian coming from the integration over the s^^. This term 
gives rise to sub-leading contributions in the spin wave 



The matrix A is, by construction, a positive semi- 
definite matrix. We can find eigenvalues a\^ and eigen- 
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vectors w*^ as usual through 



'^Aijwf = akwf. 



(B6) 



There is one zero eigenvalue, ai = 0, cor- 
responding to the constant eigenvector w^ = 



E^iX = ;;^E^y 



0. 



(B7) 



lated only to the projection of the {-Tfi} onto this zero 
mode. We note that in a system with translation in- 
variance, the eigenvectors are Fourier modes, or plane 
waves, and these are called spin waves in the theory of 
magnetism. The zero eigenmode is related to the spon- 
taneous breaking of symmetry when the flock chooses a 
consensus direction of flight — all directions ft are equally 
probable, a priori, and hence have equal probability or 
energy, and the zero mode is the remanent of this sym- 
metry; in physics this is the Goldstone mode. 



The argument of the delta function in Eq (B4) is re- defined by {w^}: 



We can now rewrite Eq ( B4 ) in the orthonormal basis 



/ 



Zi{Jij})= d^TT' S{7T[)exp 



1 ^ 

oE"kl 



f |2 



k=l 



JV 



E-^ii 



(B8) 



where tt^ = ^. w^ifi. This leads to 



1 ^ 
logZ({Jij}) = -^log(ak) + ^ ^ Jij , 



(B9) 



k>l 



ij = l 



where we drop constant terms independent of Jy . 

Let us now proceed, at a formal level, with the maxi- 
mum entropy approach. The parameters Jy are fixed by 
requiring that {si-Sj)p = (5i-5j)exp- If we focus on the 
perpendicular part of the correlation, this implies 



(7fi-7fj)exp = 2^ 



k>l 



h h 
dk 



(BIO) 



where the right hand side, the expectation value (7fi-7fj)p, 
can be obtained from Eq ( B8 ) using Gaussian integration 



rules, the factor 2 coming from the two independent de- 
grees of freedom of each tti. According to this equation. 




Longitudinal component s 



FIG. 5: Longitudinal components of the flight directions vs. 
prediction of the spin wave expansion, for all individuals in 
the snapshot of Fig[l] The black line has slope 1. Note that 
95% of the birds have s\ > 0.94, and lie well on the line. 



the matrix Ay — and therefore the interaction matrix 
Jij — is easily obtained by taking the inverse of the exper- 
imental perpendicular correlation function (once we take 
away the zero mode due to symmetry). But, to be in- 
vertible, the experimental correlation matrix must have 
N — 1 nonzero eigenvalues. This can only be achieved 
by performing a huge number of experiments, i.e. eval- 
uating the experimental average over a number of in- 
dependent samples larger than the number of birds in 
the flock. As discussed in the main text, the interac- 
tion network in a flock changes continuously in time, 
since individuals move and change their neighbors. But 
the average over many independent realizations of (si-s^) 
would require birds to stay still at some fixed positions, 
while updating and realigning their velocities, which is 
definitely not the case. In other terms, different experi- 
mental samples (i.e. snapshots) correspond to different 
networks Jy and cannot be averaged together. Thus, in 
our case, the maximum entropy model must be solved 
independently at each time step, for which we have only 
one experimental sample. Unfortunately, if we compute 
the correlation matrix from a single snapshot, it has rank 
two and cannot be inverted. This motivates, as discussed 
in the text, the analysis of a more restricted problem, in 
which we know only the average local correlations Cint 
from Eq ([5|. 



Appendix C: Computation Avith free boundaries 

Let us now address more in details the reduced model, 
Eq ([5| in main text, where each individual interacts with 
constant strength with its first Uc neighbors. In this case 



the Jij 's have a particularly simple form: 
Jij = J nij 



(Cl) 
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with 



1 if j ^ ^c ^^^ i ^ ^c 







if j G nj, and i ^ n^, , or vice versa, and 



otherwise. 



(C2) 
Here, J indicates the strength of the interaction and nj, 
indicates the set of the first Uc neighbors of bird i. Since 
we know the spatial coordinates of all the birds in the 
flock, once the parameter ric is fixed, we can compute 
all the neighborhoods and determine the matrix nij. In 
this case, therefore, Ay = JAy, where A depends only 
on the neighborhood relations. 

Before proceeding with the full computation, let us 
briefly look at the simplest case, where we allow all the 
TTi's to freely fluctuate according to Eq (B8). The result 



can be read directly from Eq (|B9|), giving 



log Z( J, ^c) = - ^ ^ log( JAfe) 



NJUr 



(C3) 



k>l 



where the \k are the eignevalues of A. Similarly, we can 
compute the correlation functions. 



(7fi-7fj) 



2 w^w^ 

k>l '^ 



{sl^s^} = 1 



^E 



(w^f + (wh^ 



J 



fe>i 



Afc 



(C4) 



where Xk and w^ are, again, the eigenvalues and the 
eigenvectors of the matrix A and depend only on Uc. 

To build the maximum entropy model, we need to find 
the appropriate values for J and ric- As discussed in Ap- 
pendix |AJ this amounts to maximizing the log-likelihood 
of the experimental data given the model. This can be 
written simply as 



l0gP({5i}) 



logZ{J,ne) + ^JNn,C^:^ , 



(C5) 



where, following the notation in the main text. 



ieni 



indicates the degree of correlation up to the interaction 
range ric. 

Maximizing with respect to J, which corresponds to 
setting Cint(^, ^c) = ^hit^^ gives immediately 

j = y(i-c;Lr)- (C7) 

This equation provides an explicit relationship between 
J and Uc. Substituting into Eq (C5), the likelihood be- 



comes a function of ric only, and its maximum can be 
found numerically. 



The values of J and Uc^ obtained from this computa- 
tion with free boundaries, are displayed in Fig[6J for all 
the flocking events we analyzed. They are strongly cor- 
related to what we find with fixed boundary conditions 
(see next section): the value of Uc is slightly smaller, the 
value of J slightly larger, but the product Jric is approx- 
imately the same. On the contrary, the prediction for 
the perpendicular correlation as a function of distance 
(FigpC and D) is less satisfactory: while the correlation 
length is correctly reproduced, the decay of the corre- 
lation with distance is significantly faster. Besides, the 
value of the perpendicular correlation near r = looks 
much smaller than the experimental value. To better 
understand this point we note that 



a. 



Qnt + ^ ' 



-E 



^E(l-Jl') 



(C8) 



J^nc 



where, we recall, S is the polarization. The first term 
in this decomposition of Cint represents the perpendicu- 
lar part of the correlation up to scale Uc, while the last 
term is a 'local' polarization getting contributions only 
from individuals on a scale Uc. The maximum entropy 
model, by construction, reproduces correctly the experi- 
mental value of Cint- What happens in the computation 
with free boundaries is that the model underestimates 
the contribution on short scales (n < Uc^ corresponding 
to spatial scales of a few meters) from the perpendicular 
part of the correlation, and compensates by overestimat- 
ing the polarization. The effect is more or less strong in 
different flocks, as seen in Figs[6p and D. 

As discussed in the main text, there are good reasons 
to think that birds on the edge of the flock should be 
described differently from those in the bulk; Fig |6p is 
evidence that if we ignore these differences we really do 
fail to predict correctly the correlation structure of the 
flock as a whole. 



Appendix D: Computation with fixed boundaries 

To improve our approach, we need to consider more 
appropriate boundary conditions. As discussed in the 
main text, birds on the border of the flock are likely 
to behave differently from birds in the interior of the 
flock. This occurs because they experience a different 
kind of neighborhood, part of the space around them 
being devoid of neighbors. Besides, these birds are con- 
tinuously exposed to external stimuli and their dynamics 
may be strongly influenced by environmental factors (ap- 
proaching predators, obstacles, nearby roosts, ...). Thus, 
modelling birds on the border might require taking into 
account other ingredients than the interactions between 
individuals. Rather than trying to making a model of 
these (largely unknown) factors, we can take the veloc- 
ities of these border birds as given, and ask that our 
model of interactions predict the propagation of order 
throughout the bulk of the flock. 
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FIG. 6: Computation with free boundary conditions vs computation with fixed flight directions on the border. (A) Values 
of the parameter Uc for all the flocking events; the black line is the linear regression. Error bars are standard deviations 
across multiple snapshots of the same flock. (B) Values of the parameter J for all the flocking events. Inset: The product 
Jric computed with free boundary vs. Juc computed with fixed boundary; now the slope is almost unity. (C) and (D) 
Perpendicular correlation as a function of distance for event 28-10 (as in Fig IT] N — 1246 birds) and event 32-06 {N — 809 
birds). Different symbols correspond to the correlation measured in experiments, the correlation computed with free boundary 
conditions and the one computed with fixed boundary conditions. Taking into account the flight directions of individuals on 
the border significantly improves the prediction for the correlation. 



If we consider the flight directions of birds on the bor- 
der as given, the computation of the partition function 
becomes more complicated. The starting point is analo- 
gous to Eq (Bl), but the integration must be performed 
with respect to internal variables only. It is then con- 



venient to separate, in the exponent of Eq (Bl), contri- 



butions coming from internal and external birds. Let us 
call X and B the subsets of internal and border individu- 
als, respectively. Then, in the spin wave approximation, 
we find an expression similar to Eq (B4): 



Z{{Jii}:B) 



d 77 S I /^TTi exp 



-^ Yl ^iJ^i • ^J + IZ^i • ^i + 9 Zl ^iJ + 9 Zl^^ + 9 IZ ^ij^i • ^J 



ijex 
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iGX 
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iDl) 



where 

/^i = ^ Jii^i = ^ Jii (s}"n + TTi) = hfn + hf 
\eB \eB 



(D2) 



^ij = ^ij (Z^ik + /^M - Jij ijGX (D3) 

\keT J 

Here hi is a 'field' describing the influence of birds on 



the border on internal bird i. The effect of this field 
is to align bird i with the border birds that are within 
its direct interaction neighborhood n].. Thus, when ric 
is small, this field only acts on individuals close to the 
border, while it is zero well inside the flock. We also 
note that, as compared to Eq ( |B4[ ), the matrix A is now 
defined only for internal birds and gets an additional di- 
agonal contribution coming from individuals on the bor- 
der. As a result, A has no longer a zero mode. From 
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a conceptual point of view, when we fix the direction of tional point of view, this implies that we cannot express 

motion of birds on the border, not all directions in the in a simple way the constraint on the {ttiI's as we did in 

bulk are a priori equivalent; rather, the boundary con- the case of a free boundary, 
ditions explicitly break the symmetry. From a computa- 



J 



To deal with the constraint, it is convenient to use an integral representation of the delta function 

dz 
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E-- 



Substituting into Eq (|D1|, we obtain 

d^TZ exp 
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(D5) 



where G{B) is a function of boundary variables only. We notice that all the integrals are Gaussian, and we obtain, 
finally, 
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where G(B) is written explicitly. Recall that the matrix A is only defined on internal individuals and hence the 
number of eigenvalues that contribute to the computation of det(74) is given by the number of internal birds. In the 
same way, we can easily compute correlation functions. We find 



/^\ \^( A-^\ ?P EjGx(^ )ij 

(^i) = 2^(^ )ij^i - ^^ M^Tv 



les kjei 



and 



(7?i-7rj) = {7fi)-(7rj) + 2 



(^-')ij 



Ek„ei(^-^)ik(^-^) 



nj 



Eunex(^-^) 



kn 



(D7) 



(D8) 



r 



At this point, to solve the maximum entropy model for 
the reduced case, we simply substitute the parametriza- 
tion Jij = Jriiy The log-likelihood takes the form 



l0gP({5i}) 



exp 



= - log Z{J, rie; B) + -Jn.NCt^^ . 

(D9) 



with Z{J^nc]B) as in Eq (Dl). To find the optimal 



value for the parameters J and Uc we need to maximize 
the likelihood. Maximization with respect to J again is 
equivalent to matching the predicted correlations to the 
experimental ones, Cint{J^nc;B) = C^^^ . This equation 



is represented graphically in Fig[2]A. in the main text. It 
is worth noting that, as in the case with free boundary 
conditions, it is possible to solve this equation analyti- 
cally. We can define 



A 






(DIO) 
(Dll) 



both of which are independent of J, and then, after some 
algebra, we obtain 



(TVin - 1) _ 1 



p p 



J 



^Y.{A-%h,-h, 2 
ijex 



Z.P 



EieB^i + Eijei(^-')iJ^ 



Eijex(^-^)i. 



-Y.h\+Y.^^rnSyS^ + ^{l-Ct:^) (D12) 
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where A^in is the number of internal birds. Note that the 
right hand side is a function of ric only, so we hav e an 
expression for J{nc]B). Substituting back into Eq (D9) 



we get the likelihood as a function of Uc only. Maximiza- 
tion can be performed numerically, as shown in Fig|2^ 
in the main text. 

Values of J and Uc for all flocks are collected in Figs|2] 
in the main text and in Fig[6J In this figure, we see the 
improvement in the prediction of the correlation function 
C{r) that comes with fixed boundary conditions. 



Appendix E: A global model 

Given a fiock of birds, so far we have solved the max- 
imum entropy model for each individual snapshot inde- 
pendently, and then we have averaged the inferred val- 
ues of the parameters J and Uc over all the snapshots. 
This is the most general procedure we can use, consistent 
with the dynamical nature of the interaction network. 
The inferred values of J and ric fiuctuate from snapshot 
to snapshot, due to several factors. It is possible that 
birds slightly adjust interaction strength and range dur- 
ing time, but there are other noisy contributions that 
might increase the fiuct nations. The fiocks we analyzed 
are finite groups, ranging from a few hundreds to a few 
thousands individuals, and we therefore expect finite size 
effects. The algorithmic procedure to reconstruct posi- 
tions and velocities of individual birds in the fiock is very 
efficient but not perfect, and there are fluctuations across 
snapshots in the number of reconstructed individuals; see 
Refs [m [17] for details on the 3D reconstructions. Fi- 
nally, the log-likelihood can be very flat in the region of 
the maximum: in this case even small fluctuations can 
cause the value of the maximum to jump from a value 
of Uc to another one quite different. Averaging ric and 
J over the snapshots, we get rid of these ffuctuations. 
Alternatively, we can assume from the start that, given 
a flock, there is a unique value of Uc and J through time. 
In this case, the log-likelihood of each snapshot is a func- 
tion of the same J and Uc and we need to optimize the 
global likelihood corresponding to all the snapshots, and 
not each one independently. In other terms, we flrst com- 
pute the average of the log-likelihood over the snapshots 
at J and ric flxed, and then we maximize with respect 
to the two parameters. Note that we are inverting the 
procedure described in the previous sections, where, on 
the contrary, we flrst maximize each individual snapshot 
with respect to J and ric and then we take the average 
over all the snapshots of the optimal parameters. The 
computation of the average log-likelihood can be easily 
done starting from the equations for the single snapshot. 
Let us denote, for future convenience, by 

^a{J.nc) = -logZ(J,n,;S«) + ^Jn,7VCf,"^^ (El) 

the log-likelihoo d of the snapshot a with parameters J 
and ric (see Eq (|D9|)). Then, the average log-likelihood 



for all the snapshots is 

*^global(^, ric) = -V7 Yl ^"(^' ^c) ' (^^) 



where A/gnap is the number of snapshots available for that 
flock. At this point, we need to maximize <3^giobai over J 
and ric. The maximization with respect to J leads, once 
again, to an explicit expression for the optimal J, that 
we shall call Jgiobab as a function of ric'. 



1 



1 



(^i? 



1) 



1 



^global 



^snap 



{Nl 



;lobal 



1) Ja{nc) 



(E3) 



where Jai^c) is the optimal value of J for the snapshot 
a, as above, N^ is the number of internal individuals 
in the snapshot a, and A^^° ^ is the averag e ov er snap- 
shots. Substituting Jgiobai(^c) back in Eq ( |E2[ ) we get 
an expression, which is a function of ric only. The like- 
lihood can then be maximized numerically with respect 
to ric. The values n^giobai and Jgiobai obtained in this 
way are plotted in Fig [7| where they are compared to 
the values inferred with the more general procedure (op- 
timizing each snapshot independently and then averag- 
ing). There is a very strong correlation with slope close 
to one. This represents a strong consistency check on 
the inference procedure. 





A 








40 




«30 






^ 


'^^^ 


U) 


1 — ' — •- 


^^^p 




£="20 


^ 
^ 


_^i<,„ 


=^ 














' 


PVV^ 











20 30 




FIG. 7: Global models of flocking events. (A) Values of 
the neighborhood size ric inferred from maximizing the log- 
likelihood averaged over snapshots, plotted vs. the mean 
values obtained from maximizing log-likelihood in individual 
snapshots. Error bars represent the standard deviation over 
snapshots for each flock. Black line has a slope 0.78. (B) As 
in (A), but for the interaction strength J; the black line has 
slope 0.92. 



Appendix F: A model for order propagation 

The maximum entropy model with flxed flight direc- 
tions on the border gives excellent predictions for two- 
point and higher order correlation functions; see Figs[3J 
|6]and|8] In addition, it allows to infer — up to a calibra- 
tion factor — the microscopic interactions in a numerical 
model of self-propelled particles. We can conclude that 
this model indeed offers a very good statistical descrip- 
tion of the flight directions of individuals in a flock. Let 
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us then look back at the model, and try to understand 
the kind of system that the model describes. 

We recall that, in this version of the model, we take 
as fixed the flight directions of the individuals on the 
border. Therefore, the model does not aim at predict- 
ing properties of border individuals, which, as we noted, 
may depend on factors other than mutual interactions. 
Rather, the model focuses on internal individuals and 
how ordering flows through the flock. The state of the 
birds on the border generate a 'fleld' {hi) on internal in- 
dividuals, but this fleld is nonzero only for individuals 
interacting directly interacting with birds on the bound- 
ary (i.e. when Jij = Jnij ^ 0). For the values of Uc 
retrieved by the model {uc ~ 20), this is only a small 
shell close to the border: all individuals well inside the 
flock, on the contrary, do not experience any direct in- 
fluence from the border. 



components right. Equations (D7) and (D8) therefore 
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FIG. 8: Correlations in the interior of the flock. (A) Perpen- 
dicular component of the two-point correlation function (as 
in Fig pK ) for internal birds only, as a function of distance. 
(B) Connected correlation function predicted in the model, 
as a function of distance. Inset: correlation length vs. flock 
size, for all the flocks that we analyzed. 



Still, if the model does describe what happens in a 
real flock, it must predict collective coherence: all flight 
directions must strongly align and internal individuals 
must behave very much in unison with their exterior 
companions. Does the model reproduce this behaviour? 
If so, what is the mechanism leading to this kind of or- 
dering? How do individuals on the border transmit infor- 
mation about their flight directions to distant individuals 
with whom they do not interact directly? 

Th e fo rmal answer to these questions can be read in 
Eq's (IDTI and (D8). The flrst equation indicates that the 



model predicts a well deflned perpendicular component 
of the flight direction (yfi) for each internal individual 
i. Surprisingly, these perpendicular components agree 
remarkably well with the ones measured experimentally 
(see Fig [Sp in main text), not only for birds close to 
boundary, but also well inside the group. The second 
equation provides a prediction for the correlation func- 
tion. Visualization of these correlations as a function 
of distance shows that these predictions also are very 
good. We note that, since the longitudinal component 
of the flight direction is given by (sf) = 1 — 0.5(|7fip), 
if we are getting the perpendicular components of the 
velocity right we must also be getting the longitudinal 



provide correct predictions of the full flight directions 
for all individuals in the flock. 

The mechanism through which such ordering occurs, 
is the presence of long ranged correlations in the system. 
This can be seen more easily rewriting the equations in 
the following way: 



(^i) 









(^i-^j) = q°" + (^i)-(^j) 



qon ^ 2 



ZlknGX Ak ^ 



nj 



Z^knGX ^kn 



(F2) 



(F3) 



where we have separated the part of the correlation, 
which is locally connected (i.e. the covariance). 
In Eq (Fl) the flrst term describes a contribution 



/^con 



coming from individuals on the border, while the sec- 
ond term is just a renormalization factor to ensure that 

XliGx('^i) + ^leB^^ ~ ^- ^^ ^^^ ^^^ from Eq (Fl) that 
an individual i far from the border can also feel the ef- 
fect of birds on the border, provided there is a nonzero 
connected correlation C^j^" between i and some individ- 
ual j close to the border. In other terms, while direct 
mutual alignment occurs only between border individu- 
als and immediate neighbors (for which hi are non zero), 
effective alignment occurs also with internal birds that 
are indirectly correlated with them (for which C^°"/^j are 
nonzero). If the connected correlations extend over suf- 
flciently long distances, this mechanism ensures propa- 
gation of directional information trough the whole flock. 
In Fig|8^ we show the behaviour of the connected cor- 
relation as a function of distance, for one flocking event. 
The scale over which this function decays, the correla- 
tion length, is of the order of the thickness of the flock 
(maximum distance between an internal point and the 
border) , showing that C^°" indeed is long ranged enough 
to propagate ordering well inside the group. In the in- 
set, we show that the correlation length grows linearly 
with flock size, for all the flocking events we analyzed. 
Thus the correlation function is scale free: no matter how 
large the flock is, the correlation always extends over the 
whole flock. 



Appendix G: Correlation functions 

In this section we summarize the deflnitions of all the 
correlation functions introduced in the paper and we 
comment on their behaviour. 

The pairwise correlation. Let us start by recalling 
the deflnition of the pairwise correlation. 



(5i.5j) = (5p5h + (?i.?j) 



(Gl) 
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where, for future convenience, we have separated the lon- 
gitudinal part of the correlation from the perpendicular 
one. We note that while the sample average of the per- 
pendicular flight direction is zero, {1/N) XI i '^i = 0^ ^^^ 
same is not true for the longitudinal direction. Rather, 
we have (l/N) "^^sf = S^ and the longitudinal corre- 
lation is dominated by a contribution from the global 
polarization S. To better investigate the degree of cor- 
relation in the system, it is then convenient to focus on 
fluctuations of the individual flight directions with re- 
spect to the sample average. To this end, in all the fig- 
ures in this paper we consider the following correlations, 
where we have subtracted the sample average contribu- 
tion: 



a 



ci 



(Tfi-TTj) 
((1- 



(G2) 



S){s\ 



S)) 



C'ij — Qj + Qj 



;(7rf))(l-5--(7r2)) (G3) 



(G4) 



The last identity in Eq ( G3 ) is a consequence of the spin 
wave approximation. 

Connected correlations. In Appendix |D] we have 
described a theory where we get nonzero expectation val- 
ues for the flight directions of individual birds, (tti) 7^ 0. 
In this case, it may be useful to look at correlation func- 
tions which are locally connected, i.e. that describe how 
the individual bird flight direction fluctuates with re- 
spect to its own average value and not — as in the pre- 
viously defined correlations — with respect to the sample 
average. To this end, we have introduced in Appendix 
[Fjthe following connected correlation function 



/<-YCon 



(7fi-7fj) - (yfi) • (ttj) 



(G5) 



We note that in our case Cy°" is purely a theoretical 
construct. Indeed, we have applied the maximum en- 
tropy approach to each single snapshot independently. 
For a single snapshot, the experimental measurement of 
the correlation only consists in one configuration (the 
velocity field at that instant of time) and we cannot dis- 
tinguish between connected and non-connected correla- 
tions. The only quantity that can be compared between 
theory and experiments is therefore (tti • ttj ) . 

The degree of direct correlation. One important 
quantity entering our computation is the degree of direct 
correlation, 



Ci, 



vE;rE(^"^i-^1)' 



N 



(06) 



j^nj^ 



which measures the average correlation between an in- 
dividual and its first Uc neighbors. This degree of di- 
rect correlation is a single scalar quantity, and represents 




FIG. 9: Sketch of the structure of the four point correlation 
function. Red circles represent birds. Birds i and j have 
mutual distance rij = ri; birds k and 1 also have mutual 
distance rki = ri . The distance between the mid-point of the 
ij pair and the midpoint of the kl pair is r2 . 



the input observable used by our maximum entropy ap- 
proach to build a statistical model for the flight direc- 
tions. 

The two— point correlation function. To describe 
the behaviour of the correlation at different scales, it is 
convenient to define the two-point correlation function 



C{r) 



Eij '^(^ij - r) 



(G7) 



where r^ 



Fi— Fj I is the distance between bird i and bird 
j and the delta function selects pairs of individuals that 
have mutual distances in a small interval around r (the 
denominator representing the number of pairs in such an 
interval). This function measures the average degree of 
correlation between individuals separated by a distance 
r. Again it is possible to distinguish a longitudinal and 
a perpendicular component of these correlations. 



C(r)=CL(r) + CP(r), 



(08) 



describing the contributions relative to, respectively, lon- 
gitudinal and perpendicular fluctuations in the flight 
directions. Figure [3] in the main text and Fig [8] in 
the Appendix show the two-point correlation function 
computed from the maximum entropy model with fixed 
boundary conditions. The prediction agrees nicely with 
the experimental one, on all scales. We stress that the 
maximum entropy model uses as an input only Cint, 
which measures the average degree of correlation up to 
scale Tie. With the values of ric retrieved for our events 
{uc = 21.2 ± 1.7), this corresponds to a scale of the or- 
der of a few meters in r. In contrast, the two-point 
correlation function measures the correlation on all pos- 
sible scales, from close neighbors (a few meters) to the 
whole extension of the flock (hundreds of meters, for 
some flocks). Therefore, the good agreement with exper- 
iments represents a highly nontrivial prediction of the 
model. From Eq (07), the correlation function takes 



into account the contribution coming from all pairs of 
individuals, independent of whether they reside on the 
border or in the bulk of the flock. Yet, when adopting 
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fixed flight directions on tiie border of tiie flock, ttie con- 
tribution coming from birds on tiie border is by construc- 
tion identical in the predicted and observed correlation 
functions. To test more explicitly whether the model 
provides good predictions for the correlations of inter- 
nal individuals, we can consider an internal correlation 



function, deflned as in Eq (G7), but where we only count 



contributions from individuals inside the flock (i, j G X); 
the result is in Fig[8]A.. Again, the prediction of the model 
is nicely consistent with the experimental correlation. 

The four— point correlation function. We can de- 
flne correlation functions not only between pairs of indi- 
viduals, but for more complicated arrangements of birds. 
For example, let us consider a pair of birds i, j separated 
by a distance ri, and measure their mutual alignment. 
We might want to compare this degree of alignment to 
the one that another pair of birds k, 1, also separated 
from one another by a distance ri, that are located in 
another position in the flock. We can then deflne the 



following four-point correlation 

Eijki((^i-^j)(^i-^j))^ijki 



C4(ri;r2) 
Aijki 



E 



ijkl ^ijkl 



(G9) 



(5(rij - ri)(5(rki - ri)(^(rij_ki - r2) 

(GIO) 



where ry-ki indicates the distance between the mid- 
points of the pairs ij and pair kl; see Fig [9J We can 
plot C4(ri; r2) as a function of the two distances ri and 
r2. For example, in Fig|3p in the main text, it is shown 
for a flxed value of ri as a function of r2 . We also note 
that, in the spin wave approximation, the longitudinal 
correlation C^ is nothing else than a particular case of 
the four-point correlation. 



C^{r) = l-C^{0',r)-S^ 



(Gil) 
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